Numpy (numerical arrays for numeric computation)

Numpy is the fundamental Python module for scientific computing. Its most used object is the multidimensional array. These arrays can have any number of dimensions and are stored efficiently in the computer's RAM, which makes the data easy to handle and pass to other libraries. Furthermore, most of numpy is implemented in C, which makes it efficient and fast.

Multidimensional arrays

This is how numpy is usually imported and used to create a numpy array



In [ ]:

    
import numpy as np



In [ ]:

    
data = [1, 10 , 2, 3, 8.0] # data is a list
a = np.array(data) # a is now a numpy array



In [ ]:

    
type(a)



In [ ]:

    
a

This gives the shape of the array



In [ ]:

    
a.shape

the number of dimensions



In [ ]:

    
a.ndim

the number of elements



In [ ]:

    
a.size

the number of bytes



In [ ]:

    
a.nbytes

The attribute dtype describes the element data type



In [ ]:

    
a.dtype
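
The attributes above are related to each other. The following is a minimal sketch that collects them in one place; it also uses `itemsize` (bytes per element), an attribute not introduced in the text, to show where `nbytes` comes from.

```python
import numpy as np

# One float in the list (8.0) makes numpy store every element as a float
a = np.array([1, 10, 2, 3, 8.0])

print(a.shape)     # (5,)
print(a.ndim)      # 1
print(a.size)      # 5
print(a.dtype)     # float64
print(a.itemsize)  # 8 (bytes per float64 element)

# Total bytes = number of elements * bytes per element
print(a.nbytes == a.size * a.itemsize)  # True
```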

Creating new arrays

Arrays can be created with nested lists



In [ ]:

    
data = [[0.0, 2.0, 4.0, 6.0], [1.0, 3.0, 5.0, 7.0]]
b = np.array(data)



In [ ]:

    
b



In [ ]:

    
b.shape, b.ndim, b.size, b.nbytes

The function arange is similar to range but it creates an array and not a list



In [ ]:

    
c = np.arange(10) 
c

the function linspace allows for the creation of equally spaced points



In [ ]:

    
e = np.linspace(0.0, 10, 21) # 21 points, both endpoints included
e
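
The difference between the two functions can be sketched as follows: arange takes a step and excludes the end point, while linspace takes a number of points and includes the end point. The `retstep=True` argument is an extra used here only to display the spacing that linspace chose.

```python
import numpy as np

# arange: start, stop (excluded), step
c = np.arange(0, 10, 2)
print(c)            # the values 0, 2, 4, 6, 8

# linspace: start, stop (included), number of points
e, step = np.linspace(0.0, 10.0, 21, retstep=True)
print(e.size)       # 21
print(step)         # 0.5
print(e[0], e[-1])  # 0.0 10.0
```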

Similar to MATLAB, there are also functions like empty, zeros and ones.



In [ ]:

    
np.empty((4,4))



In [ ]:

    
np.zeros((3,3))



In [ ]:

    
np.ones((3,3))
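
One point worth keeping in mind is that empty only reserves memory and does not set the values, so its contents are arbitrary. A short sketch of the three functions, using the `fill` method (not mentioned in the text) to initialize the empty array afterwards:

```python
import numpy as np

z = np.zeros((3, 3))   # every element is 0.0
o = np.ones((3, 3))    # every element is 1.0
u = np.empty((4, 4))   # contents are arbitrary: whatever was in memory

print(z.sum())   # 0.0
print(o.sum())   # 9.0
print(u.shape)   # (4, 4)

# Give the empty array defined values before reading from it
u.fill(0.0)
```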

dtype

dtype (short for data type) is the attribute that holds the data type shared by all elements of the array. It is usually inferred from the data but can be set explicitly when creating the array

For instance, this is implicitly defined as an integer dtype



In [ ]:

    
a = np.array([0, 1, 2, 3])



In [ ]:

    
a, a.dtype

But you could force the creation of a complex array



In [ ]:

    
b = np.zeros((2,2), dtype=np.complex64)
b

or a float array



In [ ]:

    
c = np.arange(0, 10, 2, dtype=np.float64) # np.float was removed in recent numpy versions
c
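
As a sketch of how the data type is inferred or set, the example below also converts an existing array with the `astype` method, which is not covered in the text above. The integer width in the first case depends on the platform, so no exact value is shown for it.

```python
import numpy as np

a = np.array([0, 1, 2, 3])
print(a.dtype)             # an integer type; the exact width depends on the platform

# Set the data type explicitly at creation
f = np.array([0, 1, 2, 3], dtype=np.float64)
print(f.dtype)             # float64

# Convert an existing array; astype returns a new array and leaves a unchanged
g = a.astype(np.float64)
print(g.dtype)             # float64
print(a.dtype == g.dtype)  # False

# A single float in the input promotes every element to float
m = np.array([1, 2, 3.5])
print(m.dtype)             # float64
```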

Operations over arrays

Mathematical operations can be performed over the whole array without running a for loop.

For instance



In [ ]:

    
a = np.linspace(0.0, 10.0, 5)
print('a =', a)

b = np.ones(5)
print('b =',b)



In [ ]:

    
a * 2 # every element in the array is multiplied by 2



In [ ]:

    
a + b   #addition works element by element. The same goes for every operation
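
To make the comparison with a for loop concrete, here is a small sketch that computes the same result both ways. The names `loop_result` and `array_result` are only illustrative.

```python
import numpy as np

a = np.linspace(0.0, 10.0, 5)   # the values 0, 2.5, 5, 7.5, 10
b = np.ones(5)

# Explicit loop: one element at a time
loop_result = np.empty(5)
for i in range(5):
    loop_result[i] = a[i] * 2 + b[i]

# Whole-array expression: no loop needed
array_result = a * 2 + b

print(array_result)                               # the values 1, 6, 11, 16, 21
print(np.array_equal(loop_result, array_result))  # True
```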

Slicing

Slicing also works on arrays, and here it can be applied along several dimensions at once



In [ ]:

    
a = np.random.rand(5, 5)  # this creates a two-dimensional array of random numbers



In [ ]:

    
print(a)

Each dimension has its own index



In [ ]:

    
print(a[0,0], a[0,1]) # the first index corresponds to rows, the second to columns

to extract the values of a whole column the following syntax can be used



In [ ]:

    
a[:,0] # this is the first column

The last row could be extracted as follows



In [ ]:

    
a[-1,:] #this is the last row

Slicing also works with ranges



In [ ]:

    
a[0:2,0:3]

Assignment also works with slicing



In [ ]:

    
a[0:2,0:3] = -4.0



In [ ]:

    
a
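
Because random numbers change on every run, the same slicing operations may be easier to follow on an array with known values. This sketch assumes the `reshape` method (not introduced in the text) to arrange the numbers 0 to 19 into 4 rows and 5 columns.

```python
import numpy as np

# 4 rows x 5 columns:
#    0  1  2  3  4
#    5  6  7  8  9
#   10 11 12 13 14
#   15 16 17 18 19
a = np.arange(20).reshape(4, 5)

print(a[1, 2])      # 7 (row 1, column 2)
print(a[:, 0])      # first column: 0, 5, 10, 15
print(a[-1, :])     # last row: 15, 16, 17, 18, 19
print(a[0:2, 0:3])  # rows 0-1 and columns 0-2

# Assign one value to the whole block
a[0:2, 0:3] = -4
print(a[0, :])      # first row is now -4, -4, -4, 3, 4
```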

Exercise 1.1

Create a two-dimensional array of random numbers with shape (4,8).

First, set the last column to -1 and then set the second row to 2

Boolean indexing

Arrays can be indexed using other boolean arrays.

For instance consider these two arrays with the age and gender of a set of 10 people



In [ ]:

    
age = np.array([23, 56, 67, 89, 23, 56, 27, 12, 2, 72])
gender= np.array(['m', 'o', 'f', 'f', 'm', 'f', 'm', 'o' ,'m', 'o'])

Suppose that we want to select only the people whose gender is marked as 'o' (other).

The following statement creates a new boolean array. Each element tells us whether the condition is True or False



In [ ]:

    
ii = (gender == 'o')
print(ii)

Now, if we want the ages of the people with gender 'o', all we have to do is:



In [ ]:

    
age[ii]

This logic can be extended to different conditions, for instance, let's select the items with age larger than 10 and smaller than 50



In [ ]:

    
ii = (age > 10) & (age < 50) # & is the element-wise logical AND
print(age[ii])
print(gender[ii])

The following is also a valid syntax



In [ ]:

    
age[age>30]
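
Conditions can also be combined with OR and negated. The sketch below reuses the age and gender arrays; the thresholds 18 and 65 and the names `young_or_old` and `not_m` are arbitrary choices for illustration.

```python
import numpy as np

age = np.array([23, 56, 67, 89, 23, 56, 27, 12, 2, 72])
gender = np.array(['m', 'o', 'f', 'f', 'm', 'f', 'm', 'o', 'm', 'o'])

# | is the element-wise OR
young_or_old = (age < 18) | (age > 65)
print(age[young_or_old])   # the ages 67, 89, 12, 2, 72

# ~ is the element-wise NOT
not_m = ~(gender == 'm')
print(gender[not_m])       # 'o', 'f', 'f', 'f', 'o', 'o'

# The parentheses around each comparison are required:
# & and | bind more tightly than < and >.
```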

Exercise 1.2

Using a=np.random.normal(size=1000), generate an array of 1000 random numbers drawn from a normal (i.e. Gaussian) distribution with mean zero and standard deviation one.

Print the number of elements with values larger than 2.0. Is this number close to what you expected from the properties of a Gaussian distribution?

Universal functions

Universal functions (or ufuncs) are functions that take arrays as inputs and return either arrays or scalars. They are fast (implemented in C) and make it possible to write simpler Python code without for loops. The numpy documentation provides a list of all universal functions.

For instance one could generate an array of values



In [ ]:

    
t = np.linspace(0.0, np.pi, 10)
print(t)

and then compute the values of the sin function



In [ ]:

    
print(np.sin(t))
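
As a rough illustration of what a ufunc saves, the sketch below computes the same values with a loop over `math.sin` from the standard library and with a single call to `np.sin`. The names `s_loop` and `s_ufunc` are illustrative only.

```python
import math
import numpy as np

t = np.linspace(0.0, np.pi, 10)

# Loop version: math.sin is called once per element
s_loop = np.array([math.sin(x) for x in t])

# ufunc version: one call on the whole array
s_ufunc = np.sin(t)

print(np.allclose(s_loop, s_ufunc))  # True

# ufuncs combine like ordinary arithmetic:
# sin^2 + cos^2 equals 1 for every element
print(np.allclose(np.sin(t)**2 + np.cos(t)**2, 1.0))  # True
```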

Exercise 1.3

Using a=np.random.normal(size=1000), generate an array of 1000 random numbers drawn from a normal (i.e. Gaussian) distribution with mean zero and standard deviation one.

Then, using only ufuncs on a, generate a new array b that is -1 wherever a is negative and 1 wherever a is positive.



In [ ]: